{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "3c1a8915-4f4c-4584-b464-f720fc8eb6f7",
   "metadata": {},
   "source": [
    "# Similarity calculation"
   ]
  },
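  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "dbe20206-ad36-4f80-857c-d28ce5b9cca7",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# The embedding calls below use the legacy openai<1.0 interface (openai.Embedding.create)\n",
    "import openai\n",
    "openai.api_key = 'your_api_key_here'  # placeholder: substitute your own key\n",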
    "\n",
    "from sklearn.metrics.pairwise import cosine_similarity\n",
    "from sklearn.feature_extraction.text import TfidfVectorizer"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9bedb3b9-54aa-429f-b97e-e1c71b3f72a1",
   "metadata": {},
   "source": [
    "## 4.1 Cue-Target Similarity"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "94653ced-4045-4966-8a2e-5f0e4a297adb",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define the prompts and the story\n",
    "prompts = [\n",
    "    \"The Italian Restaurant 'Napoli'\",\n",
    "    \"An Eatery\",\n",
    "    \"Mr Jones\"\n",
    "]\n",
    "\n",
    "story = \"One of the reviews was randomly selected. The selected review is positive. It was provided by Luigi. He and his friend had a wonderful experience at the restaurant. They both ordered pizza. It was expertly prepared in Neapolitan style, and the mozzarella tasted extremely fresh. Luigi was impressed by the authentic taste that reminded him of his holiday in Naples, Southern Italy. For dessert they ordered the restaurant's favorite, special Italian Tiramisu, which was mouth-watering. After Luigi had paid, the waiter served a traditional Italian drink, Limoncello, that Luigi had never heard of before and loved. As they left the restaurant, Luigi was very happy and thought to himself \\\"I'll be back!\\\"\"\n",
    "\n",
    "# Combine prompts and story into a single list for TF-IDF vectorization\n",
    "corpus = prompts + [story]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "ca185809-993f-49b0-853e-985ac3dd8530",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The Italian Restaurant 'Napoli' -> 0.30826374450691224\n",
      "An Eatery -> 0.0\n",
      "Mr Jones -> 0.0\n"
     ]
    }
   ],
   "source": [
    "# Initialize the TF-IDF Vectorizer\n",
    "vectorizer = TfidfVectorizer()\n",
    "\n",
    "# Fit and transform the documents\n",
    "tfidf_matrix = vectorizer.fit_transform(corpus).toarray()\n",
    "\n",
    "# Compute the cosine similarity between each prompt and the story\n",
    "for i, prompt in enumerate(prompts):\n",
    "    print(prompt, '->', cosine_similarity(tfidf_matrix[[-1]], tfidf_matrix[[i]])[0][0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "1c1c5dff-e1b6-4ace-91ab-24b662e30e26",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The Italian Restaurant 'Napoli' -> 0.5253772456093638\n",
      "An Eatery -> 0.27504836385725057\n",
      "Mr Jones -> 0.09962151565278252\n"
     ]
    }
   ],
   "source": [
    "# Generate embeddings using the OpenAI API\n",
    "response = openai.Embedding.create(\n",
    "    input=corpus,\n",
    "    model='text-embedding-3-large'\n",
    ")\n",
    "gpt_embeddings = np.array([embedding['embedding'] for embedding in response['data']])\n",
    "\n",
    "# Compute the cosine similarity between each prompt and the story\n",
    "for i, prompt in enumerate(prompts):\n",
    "    print(prompt, '->', cosine_similarity(gpt_embeddings[[-1]], gpt_embeddings[[i]])[0][0])"
   ]
  },
  {
   "cell_type": "raw",
   "id": "7194cc76-e629-408c-be26-47c4ac0f5032",
   "metadata": {},
   "source": [
    "We validate these intuitions by computing the cosine similarity between each prompt and the story. Using a TF-IDF vectorization with the three prompts and the story as the corpus, we obtain a similarity with the story of 0.31 for \"The Italian Restaurant 'Napoli'\", against 0.00 for both \"An Eatery\" and \"Mr Jones\", which share no vocabulary terms with the story. Using a vectorization based on OpenAI's state-of-the-art embedding model, text-embedding-3-large, we obtain a similarity with the story of 0.53 for \"The Italian Restaurant 'Napoli'\", against 0.28 for \"An Eatery\" and 0.10 for \"Mr Jones\"."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "614f8eef-41cc-4693-8c6f-372de9bc4dd2",
   "metadata": {},
   "source": [
    "## 4.2 Similarity of Cue to Non-Target Information"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "585902b2-1c67-40a6-bc57-7755b45940a2",
   "metadata": {},
   "outputs": [],
   "source": [
    "stories_dict = {\n",
    "    \"High Similarity\": {\n",
    "        \"Food truck\": \"One of the reviews was randomly selected. The selected review is positive. It was provided by Justin, who had a hot dog at the food truck and loved it. A sign claimed to serve ``a bite of heaven for just a few bucks.'' The juicy sausage hissed and sizzled on the grill as delicious aromas filled the air. Once golden brown, it was nestled inside a slightly toasted bun, soft as a cloud. The toppings were a gourmet surprise: caramelized onions simmered in bourbon, creamy avocado mayo, spicy jalapeno relish, and a sprinkle of crumbled feta cheese. Justin's first bite was an epiphany. The sausage was perfectly seasoned, with a hint of smokiness, while the toppings complemented each other perfectly -- the sweetness of the onions, the creaminess of the mayo, the tang of the relish, and the salty kick of feta. Justin savored every bite of that hot dog. It was an unexpected gourmet experience that was nothing short of legendary.\",\n",
    "        \"Sports stadium\": \"One of the reviews was randomly selected. The selected review is negative. It was provided by Darren, who had a hot dog at the stadium and hated it. A sign claimed to serve ``the pinnacle of flavor for mere pennies.'' The shriveled hot dog cracked and smoked on the grill, creating a revolting smell. Once charred black, it was slammed inside a rock-hard bun, dry as desert sand. The toppings were a nasty shock: overripe relish oozing with slime, rancid garlic mayo, wilted lettuce, and a sprinkle of stale blue cheese. Darren's first bite was pure regret. The hot dog tasted burnt beyond belief, while the toppings clashed in an awful way -- the sourness of the relish, the bitterness of the mayo, the blandness of the lettuce, and the moldy hint of cheese. Darren regretted every bite of that hot dog. It was a disgusting culinary experience that was nothing short of a disaster.\",\n",
    "        \"Amusement park\": \"One of the reviews was randomly selected. The selected review is negative. It was provided by Lucas, who tried a hot dog in the amusement park and was shocked. A sign boasted ``unforgettable taste for a dime.'' The skinny hot dog shriveled and popped on the grill, releasing odors that turned heads away. Once burnt to a crisp, it was carelessly thrown into a stale bun, crumbly and old. The toppings were an unfortunate surprise: soggy sauerkraut dripping with excess water, overly pungent mustard, limp pickles, and a dab of cream cheese gone bad. Lucas' initial bite was one of dismay. The hot dog tasted like rubber, and the toppings jumbled into a mess of sensations -- the wateriness of the sauerkraut, the overpowering punch of the mustard, the lifelessness of the pickles, and the sourness of the cheese. Lucas could hardly finish that hot dog. It was a culinary disaster that was memorably underwhelming.\"\n",
    "    },\n",
    "    \"Low Similarity\": {\n",
    "        \"Food truck\": \"One of the reviews was randomly selected. The selected review is positive. It was provided by Justin, who had a hot dog at the food truck and loved it. A sign claimed to serve ``a bite of heaven for just a few bucks.'' The juicy sausage hissed and sizzled on the grill as delicious aromas filled the air. Once golden brown, it was nestled inside a slightly toasted bun, soft as a cloud. The toppings were a gourmet surprise: caramelized onions simmered in bourbon, creamy avocado mayo, spicy jalapeno relish, and a sprinkle of crumbled feta cheese. Justin’s first bite was an epiphany. The sausage was perfectly seasoned, with a hint of smokiness, while the toppings complemented each other perfectly -- the sweetness of the onions, the creaminess of the mayo, the tang of the relish, and the salty kick of feta. Justin savored every bite of that hot dog. It was an unexpected gourmet experience that was nothing short of legendary.\",\n",
    "        \"Sports stadium\": \"One of the reviews was randomly selected. The selected review is negative. It was provided by Darren, who attended a football game in a sports stadium and left deeply frustrated. A banner boasted ``unparalleled experience for true fans.'' The seating was cramped and creaked with every move, eliciting whispered complaints from spectators. Once seated, he strained to get a decent view, his line of sight blocked by a poorly placed pillar. The misgivings were manifold: an overhead screen that flickered intermittently, the blaring of mismatched commentary, unexpected seat vibrations, and a finale of a spilled drink from the row above. Darren's enthusiasm waned rapidly. The stadium, instead of amplifying the football game, detracted from it, with one annoyance after another -- the obstructed view, the distorted sound, the jarring vibrations, and the sticky mess on his back. Darren regretted attending that match. It was a sporting experience that was disappointingly off-mark.\",\n",
    "        \"Amusement park\": \"One of the reviews was randomly selected. The selected review is negative. It was provided by Lucas, who visited the amusement park and was utterly disappointed. A sign falsely promised ``adventures beyond imagination for thrill-seekers.'' Once strapped in, he was elevated to uncomfortable heights, making the rest of the park look tiny and run-down in the distance. The experiences were underwhelming: a dark, dimly lit tunnel, the abrasive gust of wind, stomach-churning drops, and an unexpected, chilling water splash at the end. Lucas' heart filled with regret. The roller coaster was a jarring blend of unease and dismay, and the elements combined into a confusing mess -- the dimness of the lights, the nausea from the descent, the jolt of the unexpected, and the cold splash at the end. Lucas wished he could forget every moment of that visit. It was a forgettable misadventure that marked a low point in his summer.\"\n",
    "    }\n",
    "}\n",
    "\n",
    "prompt = \"Food truck\"\n",
    "\n",
    "corpus = [prompt] + [v for t in stories_dict.values() for v in t.values()]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "646ef20e-f198-4b12-b38c-a5ce3cbd2bfc",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "High Similarity Food truck -> 0.07558789634895732\n",
      "Low Similarity Food truck -> 0.07558789634895732\n",
      "High Similarity Sports stadium -> 0.0\n",
      "Low Similarity Sports stadium -> 0.0\n",
      "High Similarity Amusement park -> 0.0\n",
      "Low Similarity Amusement park -> 0.0\n"
     ]
    }
   ],
   "source": [
    "# Initialize the TF-IDF Vectorizer\n",
    "vectorizer = TfidfVectorizer()\n",
    "\n",
    "# Fit and transform the documents\n",
    "tfidf_matrix = vectorizer.fit_transform(corpus).toarray()\n",
    "\n",
    "# Compute the cosine similarity between the prompt (corpus index 0) and each story\n",
    "# (story j of treatment arm i sits at corpus index 1 + 3*i + j)\n",
    "for j, story in enumerate(['Food truck', 'Sports stadium', 'Amusement park']):\n",
    "    for i, treatment in enumerate(['High Similarity', 'Low Similarity']):\n",
    "        print(treatment, story, '->', cosine_similarity(tfidf_matrix[[0]], tfidf_matrix[[1+3*i+j]])[0][0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "8219e50a-48cb-4d4c-ad55-86b7f450110a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "High Similarity Food truck -> 0.3610890059089156\n",
      "Low Similarity Food truck -> 0.3622965323667513\n",
      "High Similarity Sports stadium -> 0.22291910623955402\n",
      "Low Similarity Sports stadium -> 0.0823883816048509\n",
      "High Similarity Amusement park -> 0.25861663127943824\n",
      "Low Similarity Amusement park -> 0.13147158996689257\n"
     ]
    }
   ],
   "source": [
    "# Generate embeddings using the OpenAI API\n",
    "response = openai.Embedding.create(\n",
    "    input=corpus,\n",
    "    model='text-embedding-3-large'\n",
    ")\n",
    "gpt_embeddings = np.array([embedding['embedding'] for embedding in response['data']])\n",
    "\n",
    "# Compute the cosine similarity between the prompt (corpus index 0) and each story\n",
    "# (story j of treatment arm i sits at corpus index 1 + 3*i + j)\n",
    "for j, story in enumerate(['Food truck', 'Sports stadium', 'Amusement park']):\n",
    "    for i, treatment in enumerate(['High Similarity', 'Low Similarity']):\n",
    "        print(treatment, story, '->', cosine_similarity(gpt_embeddings[[0]], gpt_embeddings[[1+3*i+j]])[0][0])"
   ]
  },
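  {
   "cell_type": "markdown",
   "id": "non-target-mean-note",
   "metadata": {},
   "source": [
    "The per-arm averages over the two non-target stories are not computed in any cell above, so the next cell is a minimal sketch of that step. It assumes the four similarity values are copied by hand from the recorded output of the embedding cell rather than recomputed through the API, and the names `non_target_similarity` and `arm_means` are illustrative, not part of the original analysis. Replacing the literals with the corresponding entries computed from `gpt_embeddings` would tie the averages to a fresh run."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "non-target-mean-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Assumption: values copied by hand from the recorded output of the embedding cell above\n",
    "# (cosine similarity of the cue 'Food truck' to each non-target story)\n",
    "non_target_similarity = {\n",
    "    'High Similarity': {'Sports stadium': 0.22291910623955402, 'Amusement park': 0.25861663127943824},\n",
    "    'Low Similarity': {'Sports stadium': 0.0823883816048509, 'Amusement park': 0.13147158996689257},\n",
    "}\n",
    "\n",
    "# Average over the two non-target stories within each treatment arm\n",
    "arm_means = {arm: float(np.mean(list(sims.values()))) for arm, sims in non_target_similarity.items()}\n",
    "for arm, mean in arm_means.items():\n",
    "    print(f'{arm}: mean non-target similarity = {mean:.2f}')"
   ]
  },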
  {
   "cell_type": "raw",
   "id": "2cb9e0fe-2149-4ebb-858d-595c52e993cf",
   "metadata": {},
   "source": [
    "We compute the cosine similarity between the prompt and the non-target stories in the two treatment arms with the help of a large language model. A vectorization based on OpenAI's state-of-the-art embedding model, text-embedding-3-large, yields an average similarity of the prompt with the non-target stories of 0.24 in High Interference (labelled High Similarity in the code), compared to 0.11 in Low Interference (labelled Low Similarity). This validates our intuitions about the similarity relationships."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
